Design of Improved Web Crawler By Analysing Irrelevant Result
Authors
Abstract
A key issue in designing a focused Web crawler is how to determine whether an unvisited URL is relevant to the search topic. Effective relevance prediction can help avoid downloading and visiting many irrelevant pages. In this paper, we propose a new learning-based approach to improve relevance prediction in focused Web crawlers. For this study, we chose Naïve Bayes as the base prediction model, though it can easily be swapped for a different prediction model. The performance of a focused crawler depends mostly on the richness of links within the specific topic being searched, and focused crawling usually relies on a general Web search engine to provide starting points.

Key Terms: URL; focused crawler; classifier; relevance prediction; links; search engine; ranking

Full Text: http://www.ijcsmc.com/docs/papers/August2013/V2I8201356.pdf
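To make the approach concrete, below is a minimal sketch of Naïve Bayes relevance prediction for unvisited URLs, scoring each candidate by the words in its anchor text. The training data, topic, and threshold are illustrative assumptions, not details from the paper; a real crawler would train on features of pages it has already judged relevant or irrelevant.

```python
import math
from collections import Counter

# Hypothetical toy training set: anchor texts labelled 1 (relevant to an
# assumed topic, "machine learning") or 0 (irrelevant). Purely illustrative.
TRAIN = [
    ("machine learning tutorial", 1),
    ("deep learning course notes", 1),
    ("neural network basics", 1),
    ("buy cheap shoes online", 0),
    ("celebrity gossip news", 0),
    ("site map and contact page", 0),
]

def train_nb(samples):
    """Estimate class priors and per-class word counts for Naive Bayes."""
    class_counts = Counter(label for _, label in samples)
    word_counts = {0: Counter(), 1: Counter()}
    for text, label in samples:
        word_counts[label].update(text.split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return class_counts, word_counts, vocab

def predict_relevance(text, model):
    """Return P(relevant | anchor text) with Laplace-smoothed likelihoods."""
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    log_post = {}
    for label in (0, 1):
        logp = math.log(class_counts[label] / total)  # class prior
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in text.split():
            # Add-one smoothing keeps unseen words from zeroing the product.
            logp += math.log((word_counts[label][w] + 1) / denom)
        log_post[label] = logp
    # Normalise the two log posteriors into a probability of relevance.
    m = max(log_post.values())
    exp = {k: math.exp(v - m) for k, v in log_post.items()}
    return exp[1] / (exp[0] + exp[1])

model = train_nb(TRAIN)
score = predict_relevance("free machine learning course", model)
# A focused crawler would enqueue the URL only if the score clears a threshold.
print(score > 0.5)  # → True
```

The same interface applies whatever classifier sits behind `predict_relevance`, which is the switchability the abstract alludes to: only the training and scoring internals change.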
Similar Papers
A Tool for Link-Based Web Page Classification
Virtual integration systems require a crawler to navigate through web sites automatically, looking for relevant information. This process is online, so whilst the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory to improve the crawler efficiency. Most crawlers need to download a page to d...
Full Text

An Effective Focused Web Crawler for Web Resource Discovery
Given the volume of the Web and its speed of change, the coverage of modern search engines is relatively small. Web crawling is the process search engines use to collect pages from the Web, so collecting domain-specific information from the Web is a recurring research theme. In this paper, we introduce a new, effective focused web crawler. It uses smart methods to ...
Full Text

A Tool for Web Links Prototyping
Crawlers for Virtual Integration processes must be efficient, given that the VI process is online: while the system is looking for the required information, the user is waiting for a response. Therefore, downloading a minimum number of irrelevant pages is mandatory in order to improve the crawler's efficiency. Most crawlers need to download a page in order to determine its relevance...
Full Text

Reinforcement-Based Web Crawler
This paper presents a focused web crawler system that automatically creates a minority-language corpus. The system uses a database of relevant and irrelevant documents to test the relevance of retrieved web documents, and requires a starting web document to indicate where the search should begin.
Full Text

Intelligent Web Navigation?
Virtual integration systems retrieve information from several web applications according to a user’s interests. Being an online process, response time is a significant factor. Usually web pages contain a high number of links, some of them leading to interesting information, but most of them having other purposes, like advertising or internal site navigation. Traditional crawlers follow every li...
Full Text